"METHOD OF BUILDING A VARIABLE-LENGTH ERROR CODE"



FIELD OF THE INVENTION

The present invention relates to a method of building a variable-length error
code and to a corresponding device.

BACKGROUND OF THE INVENTION 

A classical communication chain, illustrated in Fig. 1, comprises a source
encoder followed by a channel encoder and, after the transmission of the coded signals
thus obtained, a channel decoder and a source decoder. Variable-length codes (VLC) are
classically used in source coding for their compression capabilities, and the associated
channel coding techniques combat the effects of the real transmission channel (such as
fading, noise, etc.). However, since source coding is intended to remove redundancy and
channel coding to re-introduce it, it has been investigated how to efficiently coordinate
these techniques in order to improve the overall system while keeping the complexity at an
acceptable level.

Among the solutions proposed in such an approach, the variable-length
error-correcting (VLEC) codes have the advantage of being variable-length while providing
error-correction capabilities, but building these codes is rather time consuming even for
short alphabets (and becomes prohibitive for sources with larger alphabets), and the
construction complexity is also a drawback, as will be seen.

First, some definitions and properties of the classical VLC must be recalled. A
code $C$ is a set of $S$ codewords $\{c_1, c_2, c_3, \ldots, c_S\}$, for each of which a
length $l_i = |c_i|$ is defined, with $l_1 \le l_2 \le l_3 \le \ldots \le l_S$ without any
loss of generality. The number of different codeword lengths in the code $C$ is called $R$,
with $R \le S$, and these lengths are denoted as $L_1, L_2, L_3, \ldots, L_R$, with
$L_1 < L_2 < L_3 < \ldots < L_R$. A variable-length code, or VLC, is then the structure
denoted by $(S_1@L_1, S_2@L_2, S_3@L_3, \ldots, S_R@L_R)$, which corresponds to $S_1$
codewords of length $L_1$, $S_2$ codewords of length $L_2$, $S_3$ codewords of length
$L_3$, ..., and $S_R$ codewords of length $L_R$. When using a VLC, the compression
efficiency, for a given source, is related to the number of bits necessary to transmit
symbols from said source, and the measure used to estimate this efficiency is often the
average length AL of the code (i.e. the average number of bits needed to transmit a word),
given, when each symbol $a_i$ is mapped to the codeword $c_i$, by the following relation (1):

$$AL = \sum_{i=1}^{S} l_i \, P(a_i) \qquad (1)$$

which is equivalent to the relation (2), in which the probabilities are grouped by
codeword length:

$$AL = \sum_{i=1}^{R} L_i \sum_{j \,:\, l_j = L_i} P(a_j) \qquad (2)$$

where, for a data source A, the $S$ source symbols are denoted by
$\{a_1, a_2, a_3, \ldots, a_S\}$ and $P(a_i)$ is the respective probability of occurrence of
each of these symbols, with $\sum P(a_i) = 1$ (from $i = 1$ to $i = S$). If $AL_{min}$
denotes the minimal value for the average length AL, it is easy to see that when $AL_{min}$
is reached, the symbols are indexed in such a way that
$P(a_1) \ge P(a_2) \ge P(a_3) \ge \ldots \ge P(a_S)$. In order to encode the data in such a
way that the receiver can decode the coded information, the VLC must satisfy the following
properties: to be non-singular (all the codewords are distinct, no more than one source
symbol being therefore allocated to one codeword) and to be uniquely decodable (i.e. it is
possible to map any string of codewords to the correct string of source symbols without any
error).
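
By way of illustration only, relations (1) and (2) can be checked numerically with the
following minimal sketch. The function names, the 4-symbol source and its probabilities
are assumptions made for the example and do not come from the cited documents.

```python
from collections import defaultdict
from fractions import Fraction

def average_length(codewords, probs):
    """Relation (1): sum of |c_i| * P(a_i), symbol a_i being mapped to codeword c_i."""
    return sum(len(c) * p for c, p in zip(codewords, probs))

def average_length_by_structure(codewords, probs):
    """Relation (2): accumulate the probability mass per length, then weight by length."""
    mass = defaultdict(Fraction)  # length L_i -> total probability of its symbols
    for c, p in zip(codewords, probs):
        mass[len(c)] += p
    return sum(length * m for length, m in mass.items())

# Hypothetical source: probabilities sorted in decreasing order, shortest codewords first.
toy_code = ["00", "11", "010", "101"]
toy_probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]

al_1 = average_length(toy_code, toy_probs)
al_2 = average_length_by_structure(toy_code, toy_probs)
print(al_1, al_2)  # both relations give 9/4 bits per symbol
```

Exact fractions are used so that the two relations can be compared without rounding
effects.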

An introduction to the different distances will then help to recall the notion
of error-correcting property used in the theory of VLEC codes:

(a) Hamming weight and distance: if $w$ is a word of length $n$ with
$w = (w_1, w_2, \ldots, w_n)$, the Hamming weight of $w$ is the number $W(w)$ of non-zero
symbols in $w$:

$$W(w) = \left|\{\, i : w_i \neq 0 \,\}\right| \qquad (3)$$

and, if $w_1$ and $w_2$ are two words of equal length with
$w_i = (w_{i1}, w_{i2}, \ldots, w_{in})$ and $i = 1$ or $2$, the Hamming distance between
$w_1$ and $w_2$ is the number of positions in which $w_1$ and $w_2$ differ:

$$H(w_1, w_2) = W(w_1 - w_2) \qquad (4)$$

However, the Hamming distance is by definition restricted to fixed-length codes, and other
distances must be defined before considering VLEC codes.
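
The Hamming weight and relation (4) can be illustrated by the short sketch below, which
assumes binary words represented as strings of "0" and "1" (so that the difference of two
words is their bitwise XOR). The representation and the function names are choices made
for the example only.

```python
def hamming_weight(word):
    """Number of non-zero symbols in the word."""
    return sum(1 for symbol in word if symbol != "0")

def difference(w1, w2):
    """Symbol-wise difference of two binary words of equal length (bitwise XOR)."""
    if len(w1) != len(w2):
        raise ValueError("the Hamming distance is only defined for equal lengths")
    return "".join("0" if a == b else "1" for a, b in zip(w1, w2))

def hamming_distance(w1, w2):
    """Relation (4): H(w1, w2) = W(w1 - w2)."""
    return hamming_weight(difference(w1, w2))

print(hamming_distance("10110", "00101"))  # the words differ in 3 positions
```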

(b) let $f_i = w_{i1} w_{i2} \ldots w_{in}$ be a concatenation of $n$ words of a
VLEC code $C$; then the set $F_N = \{f_i : |f_i| = N\}$ is called the extended code of $C$
of order $N$.

(c) minimum block distance and overall minimum block distance: the
minimum block distance $b_k$ associated to the codeword length $L_k$ of a VLEC code $C$ is
defined as the minimum Hamming distance between all distinct codewords of $C$ with length
$L_k$:

$$b_k = \min\{H(c_i, c_j) : c_i, c_j \in C,\ i \neq j,\ |c_i| = |c_j| = L_k\} \quad \text{for } k = 1, \ldots, R \qquad (5)$$

and the overall minimum block distance $b_{min}$ of said VLEC code $C$, which is the minimum
block distance value for every possible length $L_k$, is defined by:

$$b_{min} = \min\{b_k : k = 1, \ldots, R\} \qquad (6)$$

(d) diverging distance and minimum diverging distance: the diverging
distance between two codewords of different length $c_i = x_{i1} x_{i2} \ldots x_{il_i}$ and
$c_j = x_{j1} x_{j2} \ldots x_{jl_j}$ of a VLEC code $C$, where $|c_i| = l_i > |c_j| = l_j$,
is defined by:

$$D(c_i, c_j) = H(x_{i1} x_{i2} \ldots x_{il_j},\ x_{j1} x_{j2} \ldots x_{jl_j}) \qquad (7)$$

i.e. it is also the Hamming distance between an $l_j$-length codeword and the $l_j$-length
prefix of a longer codeword, and the minimum diverging distance $d_{min}$ of said VLEC code
$C$ is the minimum value for all diverging distances between every possible couple of
codewords of $C$ of different length:

$$d_{min} = \min\{D(c_i, c_j) : c_i, c_j \in C,\ |c_i| \neq |c_j|\} \qquad (8)$$

(e) converging distance and minimum converging distance: the converging
distance between two codewords of different length $c_i = x_{i1} x_{i2} \ldots x_{il_i}$ and
$c_j = x_{j1} x_{j2} \ldots x_{jl_j}$ of a VLEC code $C$, where $|c_i| = l_i > |c_j| = l_j$,
is defined by:

$$C(c_i, c_j) = H(x_{i(l_i - l_j + 1)} x_{i(l_i - l_j + 2)} \ldots x_{il_i},\ x_{j1} x_{j2} \ldots x_{jl_j}) \qquad (9)$$

i.e. it is also the Hamming distance between an $l_j$-length codeword and the $l_j$-length
suffix of a longer codeword, and the minimum converging distance $c_{min}$ of said VLEC code
$C$ is the minimum value for all converging distances between every possible couple of
codewords of $C$ of different length:

$$c_{min} = \min\{C(c_i, c_j) : c_i, c_j \in C,\ |c_i| \neq |c_j|\} \qquad (10)$$

(f) free distance: the free distance $d_{free}$ of a code is the minimum Hamming
distance between distinct words in the set of all arbitrary extended codes:

$$d_{free} = \min\{H(f_i, f_j) : f_i, f_j \in F_N,\ f_i \neq f_j,\ N = 1, 2, \ldots, \infty\} \qquad (11)$$

Following the structure model used for a VLC, it is therefore possible to
describe the structure of the VLEC code $C$ by the notation:

$$S_1@L_1, b_1 ;\ S_2@L_2, b_2 ;\ \ldots ;\ S_R@L_R, b_R ;\ d_{min}, c_{min} \qquad (12)$$

where there are $S_i$ codewords of length $L_i$ with minimum block distance $b_i$, for all
$i = 1, 2, \ldots, R$, and minimum diverging and converging distances $d_{min}$ and
$c_{min}$. The most important parameter of a VLEC code is its free distance $d_{free}$,
which greatly influences its performance in terms of error-correcting capabilities, and it
can be shown that the free distance of a VLEC code is bounded by:

$$d_{free} \ge \min(b_{min},\ d_{min} + c_{min}) \qquad (13)$$
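
The quantities of relations (5) to (10) and the bound (13) can be evaluated directly on a
small code, as in the following sketch. The function names and the toy code are
assumptions made for the example, binary codewords being represented as strings; when a
code has no pair of equal-length (or different-length) codewords, the corresponding
minimum is taken here as infinite by convention.

```python
import math
from itertools import combinations

def hamming(u, v):
    """Hamming distance between two equal-length binary strings."""
    return sum(a != b for a, b in zip(u, v))

def min_block_distance(code):
    """b_min, relations (5)-(6): minimum distance between equal-length codewords."""
    return min((hamming(u, v) for u, v in combinations(code, 2)
                if len(u) == len(v)), default=math.inf)

def pairs_of_different_length(code):
    """Yield (longer, shorter) for every pair of codewords of different length."""
    for u, v in combinations(code, 2):
        if len(u) != len(v):
            yield (u, v) if len(u) > len(v) else (v, u)

def min_diverging_distance(code):
    """d_min, relations (7)-(8): shorter codeword against the prefix of the longer one."""
    return min((hamming(lng[:len(sht)], sht)
                for lng, sht in pairs_of_different_length(code)), default=math.inf)

def min_converging_distance(code):
    """c_min, relations (9)-(10): shorter codeword against the suffix of the longer one."""
    return min((hamming(lng[-len(sht):], sht)
                for lng, sht in pairs_of_different_length(code)), default=math.inf)

def free_distance_lower_bound(code):
    """Right-hand side of relation (13)."""
    return min(min_block_distance(code),
               min_diverging_distance(code) + min_converging_distance(code))

toy_code = ["00", "11", "010", "101"]  # structure 2@2, 2@3
print(min_block_distance(toy_code), min_diverging_distance(toy_code),
      min_converging_distance(toy_code), free_distance_lower_bound(toy_code))
```

For this toy code the sketch gives $b_{min} = 2$, $d_{min} = 1$ and $c_{min} = 1$, so that
relation (13) guarantees a free distance of at least 2.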

These definitions being recalled, the state of the art in VLEC code
construction can now be described more easily. The first type of VLEC codes, called
$\alpha$-prompt codes and introduced in 1974, and an extension of this family, called
$\alpha_{t_1, t_2, \ldots, t_R}$-prompt codes, both have the same essential property: if one
denotes by $\alpha(c_i)$ the set of words that are closer to $c_i$ than to any codeword
$c_j$, with $j \neq i$, no sequence in $\alpha(c_i)$ is a prefix of a sequence in another
$\alpha(c_j)$. The construction of these codes is very simple, and the construction
algorithm is adjustable by the number of codewords at each length, which makes it possible
to find the best prompt code for a given source and a given $d_{free}$. However, this best
code performs poorly in terms of compression.



A more recent construction, allowing the construction of a VLEC code from the
generator matrix of a fixed-length linear block code, was proposed in the document
"Variable-length error-correcting codes" by V. Buttigieg, Ph.D. Thesis, University of
Manchester, England, 1995. Called code-anticode construction, this algorithm relies on line
combinations and column permutations to form an anticode at the rightmost column. Once
the code-anticode generator matrix is obtained, the VLEC code is simply obtained by a
matrix multiplication.

This technique has however several drawbacks. First, there is no explicit
method to find the line combinations and column permutations needed to obtain the
anticode. Moreover, the construction does not take into account the source statistics and,
consequently, often proves sub-optimal (one can find a code with smaller average
length by post-processing the VLEC code). In the same document, the author has
then proposed an improved method, called Heuristic method, that is based on a computer
search for building a VLEC code giving the best known compression rate for a specified
source and a given protection against errors, i.e. a code C with specified overall minimum
block, diverging and converging distances and with codeword lengths matched to the
source statistics so as to obtain minimal average length for the chosen free distance (in
practice, one takes $b_{min} = d_{min} + c_{min} = d_{free}$ and
$d_{min} = \lceil d_{free}/2 \rceil$).

The main steps of this Heuristic method, which uses the following
parameters: minimum length $L_1$ of codewords, maximum length $L_{max}$ of codewords, free
distance $d_{free}$ between each codeword, number s of codewords required, are now described
with reference to the flowcharts of Figs. 2 to 4.

To start the computer search ("Start"), all the needed parameters must be
first specified: $L_1$ (the minimal codeword length), $L_{max}$ (the maximum codeword
length), the different distances between codewords ($b_{min}$, $d_{min}$, $c_{min}$), and s
(the number of codewords required by the given source), and some relations are set when
choosing these parameters:

$L_1 \ge d_{min}$

$b_{min} = d_{free}$

$d_{min} + c_{min} = d_{free}$

The first phase of the algorithm, referenced 11, is then performed: it consists
in the generation of a fixed-length code (put initially in C) of length $L_1$ and minimal
distance $b_{min}$. This phase is in fact an initialization, performed for instance by
means of an algorithm such as the greedy algorithm (GA), presented in Fig. 5, or the
majority voting algorithm (MVA), presented in Fig. 7, or a newly proposed variation,
denoted by GAS (Greedy Algorithm by Step), which is a variation of the two above-mentioned
ones. The GAS uses the search method of the GA, except that, instead of deleting half of
the codewords, only the last codeword of the group is deleted. These two algorithms are
useful to create a set W of n-bit long words at mutual distance d (in practice, the MVA
finds more words than the GA, but it requires too much time for only a small improvement
of the compression capacity, as shown in the tables of Figs. 6 and 8, which compare the
best code structures obtained for the 26-symbol English source with different values of
$d_{free}$, respectively for the GA and for the MVA).
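
Since Figs. 5 and 7 are not reproduced here, the sketch below is not the GA or the MVA of
those figures; it only illustrates, under that assumption, what a generic greedy selection
of n-bit words at mutual distance at least d may look like. The function names and the
lexicographic scan order are choices made for the example.

```python
from itertools import product

def hamming(u, v):
    """Hamming distance between two equal-length binary strings."""
    return sum(a != b for a, b in zip(u, v))

def all_words(n):
    """All n-bit words, in lexicographic order."""
    return ["".join(bits) for bits in product("01", repeat=n)]

def greedy_select(candidates, d):
    """Scan the candidates once and keep each word that is at distance >= d
    from every word already kept."""
    selected = []
    for word in candidates:
        if all(hamming(word, kept) >= d for kept in selected):
            selected.append(word)
    return selected

initial_code = greedy_select(all_words(3), 2)
print(initial_code)  # ['000', '011', '101', '110']
```

The same selection can be applied to any candidate set, not only to the full set of
n-tuples, which is how the later phases of the method reuse it.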

The second phase of the algorithm, corresponding to the elements referenced
21 to 24 (21+22 = operation A0; 23+24 = operation A2), consists in storing (step 21) in a
set called W all the possible $L_1$-tuples at distance $d_{min}$ from each codeword in C.
If that set W of all the words satisfying the minimum diverging distance to the current
code is not empty (reply NO to the test 22: |W| = 0 ?), one extra bit (a "0" or a "1") is
affixed at the end of all words in W (step 24), except if the maximum number of bits is
exceeded (reply YES to the test 23). At the output of said step 24, this new set replacing
W has twice as many words as the previous W, and the length of each one is $L_1 + 1$.
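
Steps 21, 22 and 24 can be sketched as follows. This is a simplified illustration, not the
flowchart of Fig. 2: the test 23 on the maximum length is left out, the function names are
hypothetical, and the starting code {00, 11} with $d_{min} = 1$ is an assumption made for
the example.

```python
from itertools import product

def hamming(u, v):
    """Hamming distance between two equal-length binary strings."""
    return sum(a != b for a, b in zip(u, v))

def diverging_candidates(code, length, d_min):
    """Step 21: every `length`-bit word whose prefix is at distance >= d_min
    from each codeword (assumes no codeword is longer than `length`)."""
    words = ("".join(bits) for bits in product("01", repeat=length))
    return [w for w in words
            if all(hamming(w[:len(c)], c) >= d_min for c in code)]

def extend_by_one_bit(words):
    """Step 24: affix a '0' and a '1' to every word, which doubles the set."""
    return [w + bit for w in words for bit in "01"]

code = ["00", "11"]
W = diverging_candidates(code, 2, 1)   # step 21
if W:                                   # test 22: W is not empty
    W = extend_by_one_bit(W)            # step 24
print(W)  # ['010', '011', '100', '101']
```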

The third phase of the algorithm, corresponding to the elements 31 to 35
(= operation A3), consists in deleting (step 31) all the words of set W that do not satisfy
the $c_{min}$ distance (minimum converging distance) with all the codewords of C (i.e. in
storing only those which satisfy said minimum converging distance). At this point, the new
set W satisfies both the $d_{min}$ and $c_{min}$ distances with the VLEC code C. If that
new set W is not empty (reply NO to the test 32: |W| = 0 ?), one selects in W (step 33)
the maximum number of words satisfying the minimum block distance, in order to ensure that
all the words of the set W are in fact at distance $b_{min}$ from one another. At the end
of this step 33, realized with the GA or the MVA (note that in this case, the initial set
used for the GA or the MVA is the current W and not a set of n-tuples), the words thus
obtained are added (step 34) to the code C.
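
The third phase can be illustrated by the sketch below, in which a plain greedy scan
stands in for the GA or MVA of step 33 (an assumption, since those algorithms are only
given in the figures). The function names, the candidate set and the distance values are
chosen for the example.

```python
def hamming(u, v):
    """Hamming distance between two equal-length binary strings."""
    return sum(a != b for a, b in zip(u, v))

def satisfies_converging(word, code, c_min):
    """True if the suffix of `word` is at distance >= c_min from every
    (shorter or equal-length) codeword of the code."""
    return all(hamming(word[-len(c):], c) >= c_min for c in code)

def phase_three(candidates, code, c_min, b_min):
    # Step 31: delete the words that violate the minimum converging distance.
    kept = [w for w in candidates if satisfies_converging(w, code, c_min)]
    # Step 33 (greedy stand-in): keep words at mutual distance >= b_min.
    selected = []
    for w in kept:
        if all(hamming(w, s) >= b_min for s in selected):
            selected.append(w)
    # Step 34: add the selected words to the code.
    return code + selected

new_code = phase_three(["010", "011", "100", "101"], ["00", "11"], c_min=1, b_min=2)
print(new_code)  # ['00', '11', '010', '101']
```

Here "011" and "100" are deleted at step 31 because their 2-bit suffixes coincide with the
codewords "11" and "00", and the two remaining words are at distance 3 from each other.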

If no word is found (i.e. W is empty) at the end of the step 21 (reply YES to
the test 22: |W| = 0 ?) or if the maximum number of bits is reached or exceeded (reply
YES to the test 23), one enters the fourth phase of the algorithm (steps 41 to 46,
illustrated in Fig. 3), which is used in order to unjam the process by introducing more
freedom of choice. If there are enough codewords in the last group, some of them are
deleted from said group, as described above. Such deletions relax the distance
constraint and allow more codewords to be found than before. As a matter of fact, the
classical Heuristic method thus described begins with the maximum number of codewords of
short length, maps them to the high-probability symbols and tries to obtain a good
compression rate, but sometimes the sizes of the short-length sets are incompatible with
the required number of codewords s. In this respect, erasing a few codewords provides more
degrees of freedom and makes it possible to reach a position where the initial
requirements on distance and number of symbols for the code can be met. This deletion
process is repeated until at most one codeword remains for each length. If W is empty at
the end of the step 31 (reply YES to the test 32: |W| = 0 ?), the steps 23, 24, 31, 32 are
repeated. If the required number of codewords has not been reached (reply NO to the test
35 provided at the end of this third phase), the steps 21 to 24 and 31 to 35 must be
repeated until said steps find that either there is no further possibility to continue or
the required number of codewords is reached.

If said required number of codewords has been reached (i.e. the number of
codewords of C is equal to or greater than s: reply YES to the test 35), the structure of
the VLEC code thus obtained is used in a fifth part, including the steps 51 to 56
(illustrated in Fig. 4), in order to calculate the average length AL by weighting each
codeword length with the probability of the source. If said average length AL of this VLEC
code is lower than the minimized value of AL (= $AL_{min}$), this AL becomes the
$AL_{min}$ and the code structure is kept in memory (step 51).

To continue this search for the best VLEC code, it is necessary to avoid keeping
the same structure, which would lead to a loop in the algorithm: the last group of the
current code is deleted (steps 52, 53) and some codewords (half the amount for the GA;
the "best" one for the MVA) of the previous group are deleted (step 55), in order to
re-loop (step 56) the algorithm at the beginning of the step 21 and find different VLEC
structures (the number of deleted codewords depends on which method is used for selecting
the words: if the GA method is used and one wants to obtain a linear code, it is necessary
to delete half of the codewords, while with the MVA method only one codeword, the best
one, is deleted, i.e. the one that allows the most codewords to be found in the next
group).

However, the Heuristic method thus described often considers very unlikely
code structures or proceeds with such care (in order not to miss anything) that a great
complexity is observed in the implementation of said method, which moreover is rather
time consuming and can thus become prohibitive.

SUMMARY OF THE INVENTION 

It is therefore an object of the invention to propose an improved construction
method with which it is possible to gain in complexity by avoiding these drawbacks.

To this end, the invention relates to a method of building a variable-length
error code, said method comprising the steps of:

(1) initializing (phase 0) the needed parameters: minimum and maximum
length of codewords $L_1$ and $L_{max}$ respectively, free distance $d_{free}$ between each
codeword (said distance $d_{free}$ being for a VLEC code C the minimum Hamming distance in
the set of all arbitrary extended codes), required number of codewords s;

(2) generating (phase 1) a fixed-length code C of length $L_1$ and minimal
distance $b_{min}$, with $b_{min} = \min\{b_k ; k = 1, 2, \ldots, R\}$, $b_k$ = the distance
associated to the codeword length $L_k$ of code C and defined as the minimum Hamming
distance between all codewords of C with length $L_k$, and R = the number of different
codeword lengths in C, said generating step creating a set W of n-bit long words distant
of d;

(3) storing (phase 2) in the set W all the possible $L_1$-tuples distant of
$d_{min}$ from the codewords of C (said distance $d_{min}$ for a VLEC code C being the
minimum value of all the diverging distances between all possible couples of
different-length codewords of C), and, if said set W is not empty, affixing at the end of
all words one extra bit, said storing step replacing the set W by a new one having twice
as many words as the previous one, the length of each one of these words being
$L_1 + 1$;

(4) deleting (phase 3) all the words of the set W that do not satisfy the
$c_{min}$ distance with all codewords of C, said distance $c_{min}$ being the minimum
converging distance of the code C;

(5) in the case where no word is found or the maximum number of bits is
reached, reducing (phase A1) the constraint of distance for finding more words;

(6) controlling that all words of the set W are distant of $b_{min}$, the found
words being then added to the code C;

(7) if the required number of codewords has not been reached, repeating the
steps (1) to (6) until the method finds either no further possibility to continue or the
required number of codewords;

(8) if the number of codewords of C is greater than s, calculating (phase A4),
on the basis of the structure of the VLEC code, the average length AL obtained by
weighting each codeword length with the probability of the source, said AL becoming the
$AL_{min}$ if it is lower than $AL_{min}$, with $AL_{min}$ = the minimum value of AL, and
the corresponding code structure being kept in memory;

said building method being moreover such that at most one bit is added at the end of each
word of the set W.

It is also an object of the invention to propose a device for carrying out said
construction method.

To this end, the invention relates to a device for carrying out such a
variable-length error code building method.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will now be described, by way of example, with
reference to the accompanying drawings in which:

- Fig. 1 depicts a conventional communication channel;

- Figs. 2 to 4 are the three parts of a single flowchart illustrating the main
steps of a conventional method used for building a VLEC code, called Heuristic method;

- Fig. 5 illustrates an algorithm (called greedy algorithm, or GA) used for the
initialization of the method of Figs. 2 to 4, and Fig. 6 is a table giving various VLEC
codes for a source constructed with the Heuristic construction using said algorithm of
Fig. 5;

- Fig. 7 illustrates another algorithm (called majority voting algorithm, or MVA)
used for the initialization of the method of Figs. 2 to 4, and Fig. 8 is another table
giving various VLEC codes for a source constructed with the Heuristic construction using
said algorithm of Fig. 7;

- Figs. 9 and 10 are the two parts of a single flowchart illustrating an
implementation of the method according to the invention;

- Fig. 11 is another table giving various VLEC codes for the same 26-symbol
English source as considered in the tables of Figs. 6 and 8 and using the GAS;

- Fig. 12 is another table giving various VLEC codes for the same source as in
Fig. 11 and using both the GAS previously mentioned and the building method according to
the invention.


DETAILED DESCRIPTION OF THE INVENTION 

Simulations show that, with the classical Heuristic method, almost none of the
best codes obtained has a hole, i.e. a length jump in its structure. It is therefore
proposed, according to the invention, to consider that most good codes do not have a jump
of length and therefore to reduce accordingly the set of examined VLEC codes (which
consequently reduces the simulation time and the complexity of implementation of the
method, without modifying the AL much). Following this hypothesis, the method is
modified by avoiding adding more than one bit at the end of each word of the set W.
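
The notion of hole can be made concrete with the following sketch, which reads a hole as
a missing length between two consecutive codeword lengths of the structure, as the
paragraph above suggests. The function names and the example codes are assumptions made
for the illustration.

```python
from collections import Counter

def structure(code):
    """Structure of the code as a list of (S_i, L_i) pairs, by increasing length."""
    counts = Counter(len(c) for c in code)
    return [(counts[length], length) for length in sorted(counts)]

def has_hole(code):
    """True if two consecutive codeword lengths differ by more than one bit."""
    lengths = sorted({len(c) for c in code})
    return any(b - a > 1 for a, b in zip(lengths, lengths[1:]))

print(structure(["00", "11", "010", "101"]), has_hole(["00", "11", "010", "101"]))
print(structure(["00", "11", "0101"]), has_hole(["00", "11", "0101"]))
```

A search that appends at most one bit per iteration can only produce structures for which
such a check returns False, which is what restricts the set of examined codes.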

The corresponding implementation (improved Heuristic construction method,
with no hole optimization) is illustrated in Figs. 9 and 10, which show the two parts of a
flowchart corresponding to said improved method (the elements that are identical to the
ones observed in Figs. 2 to 4 being designated with the same references).

The main differences with the flowchart of Figs. 2 to 4 are the following ones:
(1) first, parts that, with respect to the classical Heuristic technique, are
useless for the implementation of the improved method have been cancelled:

(a) if W is empty at the end of the step 31 (reply YES to the test 32:
|W| = 0 ?), the next phase is now (see Fig. 9) not the repetition of the steps (23, 24, 31,
32), but the establishment (in place of said repetition) of a direct connection 91 towards
the input of the circuit carrying out the operation 55 (deletion of some codewords, or of
the best one, before a repetition of the steps 21 to 24 and 31 to 35), said operation 55
being then, as previously, followed by the operations 21 and following.

(b) the fourth phase of the method is now reduced to one step, the
operation 41, which is the test "Number of codewords in last group = 1 ?". If the reply is
NO, a direct link is established with the input of the step 55 (connection 91), in view of
carrying out said operation 55, and then the operations 21 and following. If the reply is
YES, a connection 92 is established with the input of the set of operations 52 to 54.
The results obtained with the present solution (called "noHole optimization
method") are presented in the table of Fig. 11 for the 26-symbol English source when using
the GAS method for selecting codewords. It can be seen, when comparing with the results
presented in Fig. 12, that although the result is not completely optimal for
$d_{free} = 3$ (the code structure has a hole at length L = 11), the AL rise is really
acceptable when one considers that there is both strictly no degradation for the other
$d_{free}$ values and a time gain of between 2.5 and 4. The same remarks can be applied
when comparing the present solution with the ones obtained in Fig. 7, where the MVA
complexity effect is clear. Similarly, applying the noHole optimization with the GA method
for selecting codewords leads to a time gain at the sole expense of a slight AL rise for
$d_{free} = 3$. Finally, Fig. 5 shows on the other hand that the current solution offers
better AL for an acceptable gain of time, the noHole optimization compensating almost
entirely for the complexity induced by the GAS.



CLAIMS

1. A method of building a variable-length error code, said method comprising the
steps of:

(1) initializing (phase 0) the needed parameters: minimum and maximum
length of codewords $L_1$ and $L_{max}$ respectively, free distance $d_{free}$ between each
codeword (said distance $d_{free}$ being for a VLEC code C the minimum Hamming distance in
the set of all arbitrary extended codes), required number of codewords s;

(2) generating (phase 1) a fixed-length code C of length $L_1$ and minimal
distance $b_{min}$, with $b_{min} = \min\{b_k ; k = 1, 2, \ldots, R\}$, $b_k$ = the distance
associated to the codeword length $L_k$ of code C and defined as the minimum Hamming
distance between all codewords of C with length $L_k$, and R = the number of different
codeword lengths in C, said generating step creating a set W of n-bit long words distant
of d;

(3) storing (phase 2) in the set W all the possible $L_1$-tuples distant of
$d_{min}$ from the codewords of C (said distance $d_{min}$ for a VLEC code C being the
minimum value of all the diverging distances between all possible couples of
different-length codewords of C), and, if said set W is not empty, affixing at the end of
all words one extra bit, said storing step replacing the set W by a new one having twice
as many words as the previous one, the length of each one of these words being
$L_1 + 1$;

(4) deleting (phase 3) all the words of the set W that do not satisfy the
$c_{min}$ distance with all codewords of C, said distance $c_{min}$ being the minimum
converging distance of the code C;

(5) in the case where no word is found or the maximum number of bits is
reached, reducing (phase A1) the constraint of distance for finding more words;

(6) controlling that all words of the set W are distant of $b_{min}$, the found
words being then added to the code C;

(7) if the required number of codewords has not been reached, repeating the
steps (1) to (6) until the method finds either no further possibility to continue or the
required number of codewords;

(8) if the number of codewords of C is greater than s, calculating (phase A4),
on the basis of the structure of the VLEC code, the average length AL obtained by
weighting each codeword length with the probability of the source, said AL becoming the
$AL_{min}$ if it is lower than $AL_{min}$, with $AL_{min}$ = the minimum value of AL, and
the corresponding code structure being kept in memory;

said building method being moreover such that at most one bit is added at the end of each
word of the set W.

2. A device for carrying out a variable-length error code building method
according to claim 1.
Abstract

The invention relates to a variable-length error-correcting (VLEC) code
technique, in which the main steps are: defining all the needed parameters, generating a
code having a fixed length $L_1$, storing in a set W thus obtained all the possible
$L_1$-tuples distant of the minimum diverging distance $d_{min}$ from the codewords (one
extra bit being affixed at the end of all words if the new set W thus obtained is not
empty), deleting all words of W that do not satisfy a distance criterion with all
codewords, and verifying that all words of the final set W satisfy another distance
criterion. Assuming that most good codes do not have a jump of length, it is then
proposed, according to the invention, to reduce the set of examined VLEC codes. Following
this hypothesis, a new construction method, called no hole optimization, is defined, in
which adding more than one bit at the end of each word of the set W is avoided. The new
algorithm does not consider very unlikely code structures and thus makes it possible to
gain in complexity.